SETS OF INTEGERS



MICHEL RIGO 

Abstract. Let $U$ be a numeration system. A set $X \subseteq \mathbb{N}$ is $U$-star-free
if the set made up of the $U$-representations of the elements in $X$ is a
star-free regular language. Answering a question of A. de Luca and
A. Restivo [10], we obtain a complete logical characterization of the
$U$-star-free sets of integers for suitable numeration systems related to a
Pisot number and in particular for integer base systems. For these latter
systems, we also study the problem of the base dependence. Finally,
the case of $k$-adic systems is investigated.



1. Introduction 

In the study of numeration systems, a natural question is to determine 
if a set of non-negative integers has a "simple" representation within the 
considered number system. Otherwise stated, is it possible, for a given set
$X \subseteq \mathbb{N}$, to find a "simple" algorithm (a finite automaton) testing
membership in $X$? This question has given rise to a lot of papers dealing with the
so-called recognizable sets. A subset $X$ of $\mathbb{N}$ is said to be $k$-recognizable if
the language made up of the $k$-ary expansions of all the elements in $X$ is
regular (i.e., recognizable by a finite automaton).

Since the work of A. Cobham [?], it is well known that the recognizability
of a set depends on the base of the numeration system. If $k$ and $l$ are two
multiplicatively independent integers then the only subsets of $\mathbb{N}$ which are
simultaneously $k$-recognizable and $l$-recognizable are exactly the ultimately
periodic sets.

Among the recognizable sets, it could be interesting to describe the sets
whose corresponding languages of representations belong to a specific subset
of regular languages. Among the regular languages, the "simplest" are
certainly the star-free languages because the automata accepting those
languages are counter-free. Having in mind this idea of "simpler" representation
of a set, A. de Luca and A. Restivo have considered in [10] the problem of
determining the existence of a suitable base $k$ such that the $k$-ary
representations of the elements belonging to a set $X \subseteq \mathbb{N}$ make up a regular language
of (unrestricted) star-height 0 (such a set is then said to be $k$-star-free). One
of the main results of [10] is that if an $l$-recognizable set $X$ is such that its
density function is bounded by $c(\log n)^d$, for some constants $c$ and $d$, then
there exists a base $k$ such that $X$ is $k$-star-free.

The star-free languages have been extensively studied in the literature
[12, 14, 15, ?]. In particular M.P. Schützenberger has shown that the star-free
languages (i.e., the languages expressed in terms of extended regular
expressions without the star operation) are exactly the aperiodic languages
[14]. We recall that a language $L \subseteq \Sigma^*$ is aperiodic if there exists a positive
integer $n$ such that for all words $u, v, w \in \Sigma^*$,

$$uv^n w \in L \Leftrightarrow uv^{n+1} w \in L.$$

In the present paper, we answer some of the remaining open questions
addressed in [10] about sets of integers having a representation of star-height
0. In particular, we give a complete characterization of the $k$-recognizable sets
such that the language of $k$-ary expansions is aperiodic. To obtain this result,
we use the first-order logical characterization of the star-free languages given
by R. McNaughton and S. Papert [12].

In the first two sections, for the sake of simplicity, we consider the case of
the binary system. Next, we show how our results can be extended not only
to the $k$-ary systems but also to numeration systems defined by a linear
recurrent sequence whose characteristic polynomial is the minimal polynomial
of a Pisot number (a Pisot number is an algebraic integer $\theta > 1$ such that
the other roots of the minimal polynomial of $\theta$ have modulus less than one).
In this wider framework, we have to consider the additional assumption that
the set of all the representations computed by the greedy algorithm is
star-free. For instance, this assumption is satisfied for the Fibonacci system. In
Section 5, we consider the problem of the base dependence of the aperiodicity
of the representations for integer base systems. The obtained result can
be related to Cobham's celebrated theorem: only ultimately periodic
sets can be $k$- and $l$-star-free for two multiplicatively independent integers $k$ and $l$, but if
the period $p$ of an ultimately periodic set is greater than 1 then this set is
$k$-star-free for some $k$ depending on $p$ but not for all $k \in \mathbb{N}$. In particular,
we show that a set is $k$-star-free if and only if it is $k^n$-star-free for any $n \ge 1$.
In the last section, we consider the case of the unambiguous $k$-adic numeration
system. It is worth noticing that the unique $k$-adic representation of
an integer is not computed through the greedy algorithm and therefore this
system differs from the other systems encountered in this paper. It appears,
unsurprisingly, that the star-free sets with respect to this latter system are
exactly the $k$-star-free sets.

In the following, we assume the reader is familiar with basic formal
language theory (see for instance [?]). Finite automata will be denoted as
5-tuples $\mathcal{M} = (Q, q_0, F, \Sigma, \delta)$ where $Q$ is the set of states, $q_0$ is the
initial state, $F \subseteq Q$ is the set of final states, $\Sigma$ is the input alphabet and
$\delta : Q \times \Sigma \to Q$ is the transition function.



2. Logical characterization of star-free languages 

Let us consider the alphabet $\Sigma_2 = \{0, 1\}$. A word $w$ in $\Sigma_2^+$ can be
identified with a finite model $\mathfrak{M}_w = (M, <, P_1)$ where $M = \{1, \ldots, |w|\}$ ($|w|$ is
the length of $w$), $<$ is the usual binary relation on $M$ and $P_1$ is a unary
predicate for the set of positions in $w$ carrying the letter 1. For our
convenience, positions are counted from right to left. As an example, the word
$w = 1101001$ corresponds to the model $(M, <, P_1)$ where $M = \{1, \ldots, 7\}$
and $P_1 = \{1, 4, 6, 7\}$. For further purposes, as in [?], we expand this model
with its maximal element max (in the latter example, max = 7); notice
that max is definable in terms of $<$. So to each nonempty word in $\Sigma_2^+$ is
associated a model $(M, <, P_1, \max)$.
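The identification of a word with its finite model can be made concrete with a few lines of code. The following is only an illustrative sketch: the function name `word_model` and the returned triple are our own choices, not notation from the text.

```python
def word_model(w: str):
    """Return (M, P1, max) for a nonempty binary word w.

    Positions are counted from right to left, starting at 1, as in the text.
    """
    if not w:
        raise ValueError("the empty word has no associated model")
    n = len(w)
    domain = list(range(1, n + 1))
    # Position i (counted from the right) carries the letter w[n - i].
    P1 = {i for i in domain if w[n - i] == "1"}
    return domain, P1, n


M, P1, maximum = word_model("1101001")
print(sorted(P1), maximum)
```

For the word 1101001 this recovers the set of positions and the maximal element given in the example above.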

A language is said to be star-free if it is obtained from finite sets by a finite
number of Boolean operations (union, intersection and complementation)
and concatenation products. McNaughton and Papert have shown that
these languages are exactly those defined by first-order sentences when words
are considered as finite ordered models [12] (a sentence is a formula all of whose
variables are bound). As an example, the language $1^+0^*$ is star-free because
if we denote by $\overline{X}$ the complement $\Sigma_2^* \setminus X$ of $X$ then

$$1^+0^* = \{1\}\,\overline{\overline{\emptyset}\{0\}\overline{\emptyset}}\;\overline{\overline{\emptyset}\{1\}\overline{\emptyset}} \quad\text{where}\quad \emptyset = \{0\} \cap \{00\}$$

and this language is also defined by the formula

$$(\exists x)\,[P_1(x) \wedge (\forall y)(x < y \to P_1(y)) \wedge (\forall y)(y < x \to \neg P_1(y))]. \tag{1}$$
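Formula (1) can be checked by brute force against the regular expression $1^+0^*$ on all short words. The sketch below is only an illustration under our own naming (`satisfies_formula_1`, `agrees_up_to` are hypothetical helpers); it evaluates the quantifiers directly over the positions of the word, counted from right to left.

```python
import re
from itertools import product


def satisfies_formula_1(w: str) -> bool:
    """Evaluate formula (1) in the model of the nonempty binary word w."""
    n = len(w)
    positions = range(1, n + 1)

    def P1(i: int) -> bool:
        # Position i is counted from the right, starting at 1.
        return w[n - i] == "1"

    return any(
        P1(x)
        and all(P1(y) for y in positions if x < y)
        and all(not P1(y) for y in positions if y < x)
        for x in positions
    )


def agrees_up_to(max_len: int) -> bool:
    """Compare formula (1) with the regular expression 1+0* on short words."""
    pattern = re.compile(r"1+0*")
    for length in range(1, max_len + 1):
        for letters in product("01", repeat=length):
            w = "".join(letters)
            if satisfies_formula_1(w) != bool(pattern.fullmatch(w)):
                return False
    return True
```

Such a finite check is of course no proof, but it is a quick way to catch a misplaced inequality when writing formulas of this kind.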

The language of all the formulas defining star-free languages will be denoted
by $\mathcal{L}_{SF}$ (if necessary, to recall the alphabet $\Sigma_2$ we can write $\mathcal{L}_{SF,2}$). Notice
that with these finite models, we are not considering the empty word.

To be precise, if $\varphi(x_0, \ldots, x_n)$ is a formula having at most $x_0, \ldots, x_n$ as
free variables, the interpretation of $\varphi$ in a word-model $\mathfrak{M}_w$ having $M$ as
domain and $r_0, \ldots, r_n$ as $M$-elements is defined in a natural manner and we
write $\mathfrak{M}_w \models \varphi[r_0, \ldots, r_n]$ if $\varphi$ is satisfied in $\mathfrak{M}_w$ when interpreting $x_i$ by $r_i$.
The language defined by a sentence $\varphi$ is

$$\{w \in \Sigma_2^+ \mid \mathfrak{M}_w \models \varphi\}.$$

2.1. Syntax of logical formulas in $\mathcal{L}_{SF}$. The first-order language of the
finite ordered models representing words is defined as follows. The variables
are denoted $x, y, z, \ldots$ and range over $M$-elements. The terms are
obtained from the variables and the constant max. The atomic formulas
are obtained by the following rules:

1. if $\tau_1$ and $\tau_2$ are terms then $\tau_1 < \tau_2$ and $\tau_1 = \tau_2$ are atomic formulas;

2. if $\tau$ is a term then $P_1(\tau)$ is an atomic formula.

Finally, we obtain the set $\mathcal{L}_{SF}$ of all the formulas by using the Boolean
connectives $\neg, \wedge, \vee, \to, \leftrightarrow$ and the first-order quantifiers $(\exists x)\ldots$ and $(\forall x)\ldots$
where $x$ is a variable.

Notice that for convenience we are somewhat redundant in our definitions:
$\varphi \vee \psi$ stands for $\neg(\neg\varphi \wedge \neg\psi)$, $x = y$ stands for $\neg((x < y) \vee (y < x))$, $\varphi \to \psi$
stands for $\neg\varphi \vee \psi$ and $\varphi \leftrightarrow \psi$ stands for $(\varphi \to \psi) \wedge (\psi \to \varphi)$. We also write
$x \le y$ for $(x < y) \vee (x = y)$.

It is worth noting that in this formalism we can define $y = x + 1$, where
$x$ and $y$ are variables,

$$y = x + 1 \equiv (x < y) \wedge (\forall z)(x < z \to y \le z)$$

but the form $z = x + y$ is not allowed if $x$, $y$ and $z$ are variables.



3. Logical characterization of recognizable sets of integers 

In the present section, we consider the binary numeration system. If $x$ is
a non-negative integer, the binary expansion of $x$ computed through the
greedy algorithm (the normalized 2-representation of $x$) is denoted $\rho_2(x)$
(for a presentation of the greedy algorithm, see [?]). Notice that $\rho_2(0)$ is the
empty word $\varepsilon$ and we allow leading zeroes in normalized 2-representations.
Thus, the set of all the normalized representations is

$$\mathcal{N}_2 = 0^*\{\rho_2(n) \mid n \in \mathbb{N}\}.$$

A set $X$ of integers is said to be 2-recognizable if the set $\rho_2(X)$ of the
normalized 2-representations of all the elements in $X$ is a regular language.

Remark 1. Allowing leading zeroes does not change the star-free behavior
of a language made up of representations. Indeed, let $\Sigma_k = \{0, \ldots, k-1\}$,
$k \ge 2$, and let $L \subseteq \Sigma_k^* \setminus 0\Sigma_k^*$ be a regular language consisting of words which do
not begin with 0. Then $L$ is star-free if and only if $0^*L$ is star-free. Indeed,

$$0^*L = \overline{\overline{\emptyset}\{1, \ldots, k-1\}\overline{\emptyset}}\;L \quad\text{and}\quad L = 0^*L \setminus (0\,\overline{\emptyset}).$$

Definition 2. A set $X \subseteq \mathbb{N}$ is said to be 2-star-free if $\rho_2(X)$ (or equivalently
$0^*\rho_2(X)$) is a regular aperiodic language.

It is well known that the 2-recognizable sets are exactly those definable
in the first-order structure $\langle \mathbb{N}, +, V_2 \rangle$ (see [?, Theorem 6.1] or [?]) where
$V_2(x)$ is the greatest power of 2 dividing $x$ (and $V_2(0)$ is 1). Thus $X \subseteq \mathbb{N}$ is
said to be 2-definable if there exists a formula $\varphi$ of $\langle \mathbb{N}, +, V_2 \rangle$ such that

$$X = \{n \in \mathbb{N} \mid \langle \mathbb{N}, +, V_2 \rangle \models \varphi(n)\}.$$

Instead of $V_2(x)$, we shall use the binary relation $\epsilon_2(x, y)$ defined by "$y$
is a power of 2 occurring in the normalized 2-representation of $x$". As an
example, $(74, 8)$ belongs to $\epsilon_2$ because $\rho_2(74) = 1001010$, but $(74, 16)$ and
$(74, 31)$ do not. Thus we can write

$$x = \sum_{(x,y) \in \epsilon_2} y.$$

Observe also that $(x, x)$ belongs to $\epsilon_2$ if and only if $x$ is a power of 2.
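The relation $\epsilon_2$ and the function $V_2$ are easy to compute with bit operations. The sketch below is an illustration only; the names `epsilon_2`, `V_2` and `powers_in` are ours, and the implementation relies on the usual machine representation of integers rather than on anything stated in the text.

```python
def epsilon_2(x: int, y: int) -> bool:
    """True iff y is a power of 2 occurring in the binary expansion of x."""
    is_power_of_two = y > 0 and (y & (y - 1)) == 0
    return is_power_of_two and (x & y) != 0


def V_2(x: int) -> int:
    """Greatest power of 2 dividing x, with the convention V_2(0) = 1."""
    return 1 if x == 0 else x & -x


def powers_in(x: int) -> list:
    """All y such that (x, y) belongs to epsilon_2, in increasing order."""
    return [1 << i for i in range(x.bit_length()) if epsilon_2(x, 1 << i)]


print(format(74, "b"), powers_in(74))
```

With these helpers one can check on small values that $x$ is the sum of the $y$ with $(x, y) \in \epsilon_2$, and that for $x \ge 1$ the value $V_2(x)$ is the least such $y$.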

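Remark 3. The structures $\langle \mathbb{N}, +, V_2 \rangle$ and $\langle \mathbb{N}, +, \epsilon_2 \rangle$ are equivalent (i.e., for
any formula $\varphi(n)$ of $\langle \mathbb{N}, +, V_2 \rangle$ there exists a formula $\varphi'(n)$ of $\langle \mathbb{N}, +, \epsilon_2 \rangle$ such
that $\{n \in \mathbb{N} \mid \langle \mathbb{N}, +, V_2 \rangle \models \varphi(n)\} = \{n \in \mathbb{N} \mid \langle \mathbb{N}, +, \epsilon_2 \rangle \models \varphi'(n)\}$ and
conversely). Indeed, $\epsilon_2(x, y)$ is defined in $\langle \mathbb{N}, +, V_2 \rangle$ by

$$(V_2(y) = y) \wedge (\exists z)(\exists t)(x = t + y + z \wedge z < y \wedge (y < V_2(t) \vee t = 0))$$

and $V_2(x) = y$ is defined in $\langle \mathbb{N}, +, \epsilon_2 \rangle$ by

$$\epsilon_2(x, y) \wedge (\forall z)(\epsilon_2(x, z) \to y \le z).$$

To be complete, notice that the binary relation $<$ is definable in the
Presburger arithmetic $\langle \mathbb{N}, + \rangle$ by

$$x < y \equiv (\exists z)(\neg(z = 0) \wedge y = x + z).$$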

For our purposes, we introduce a subset $\mathcal{L}_{2,n}$ of formulas $\varphi(n)$ in $\langle \mathbb{N}, +, \epsilon_2 \rangle$
defined as follows.

3.1. Syntax of logical formulas in $\mathcal{L}_{2,n}$. The variables range over $\mathbb{N}$
and are denoted $b, n, x, y, z, \ldots$ (when specified, $b$ and $n$ have some special role).
Roughly speaking, $n$ is dedicated to be the only free variable and $b$ plays the
role of an upper limit for all the bound variables occurring in a formula.
The only terms are the variables. The atomic formulas are obtained with
the following rules:

1. if $x$ and $y$ are variables ($\ne b, n$) then $x < y$ and $x = y$ are atomic
formulas;

2. if $x$ is a variable ($\ne b, n$) then $\epsilon_2(n, x)$ is an atomic formula.

If $\varphi$ is a formula in which $x$ is a free variable ($x \ne b, n$) then

$$(\exists x)_2^{<b}\varphi \equiv (\exists x)(\epsilon_2(x, x) \wedge x < b \wedge \varphi)$$

and

$$(\forall x)_2^{<b}\varphi \equiv (\forall x)((\epsilon_2(x, x) \wedge x < b) \to \varphi)$$

are formulas. To obtain formulas, we can also use the usual Boolean
connectives $\neg, \wedge, \vee, \to, \leftrightarrow$ either for formulas or atomic formulas. We are now
able to define $\mathcal{L}_{2,n}$. If $\varphi$ is a formula in which the only free variables are
(possibly) $n$ and $b$ then

$$(\exists b)(\epsilon_2(b, b) \wedge \varphi)$$

is a formula of $\mathcal{L}_{2,n}$ having (possibly) a single free variable $n$.

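Example 4. The formula $\varphi(n)$ given by

$$\varphi(n) \equiv (\exists b)\{\epsilon_2(b, b) \wedge (\exists x)_2^{<b}[\epsilon_2(n, x) \wedge (\forall y)_2^{<b}(x < y \to \epsilon_2(n, y)) \wedge (\forall y)_2^{<b}(y < x \to \neg\epsilon_2(n, y))]\} \tag{2}$$

belongs to $\mathcal{L}_{2,n}$. We shall see that the set $X = \{n \mid \langle \mathbb{N}, +, \epsilon_2 \rangle \models \varphi(n)\}$
is such that $\rho_2(X) = 1^+0^*$. Thus $\varphi(n)$ actually defines a 2-star-free set of
integers. As another example, the set $Y$ of the powers of 2 is 2-star-free
because $\rho_2(Y) = 10^*$ and it is also definable in $\mathcal{L}_{2,n}$ by the formula

$$\psi(n) \equiv (\exists b)[\epsilon_2(b, b) \wedge (\exists x)_2^{<b}(\epsilon_2(n, x) \wedge (\forall y)_2^{<b}(\epsilon_2(n, y) \to x = y))]. \tag{3}$$

With this definition of $\mathcal{L}_{2,n}$, we obtain quite easily the following result.

Theorem 5. A set $X \subseteq \mathbb{N}$ is 2-star-free (i.e., $\rho_2(X)$ is regular aperiodic)
if and only if $X$ is definable by a first-order formula of $\mathcal{L}_{2,n}$.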

Proof. Let us first show that the condition is sufficient. Let $X \subseteq \mathbb{N}$ be a set
defined by a formula $\psi$ of $\mathcal{L}_{2,n}$. This formula is of the form

$$\psi \equiv (\exists b)(\epsilon_2(b, b) \wedge \varphi)$$

and we can assume that $\psi$ has $n$ as only free variable. (If $\psi$ is a sentence,
then $X$ is equal to $\mathbb{N}$ or $\emptyset$ and the result is obvious.) Let us now proceed to
some syntactic transformations. In $\psi$, we keep only $\varphi$, in which we replace
each occurrence of $\epsilon_2(n, x)$, $(\forall x)_2^{<b}$ and $(\exists x)_2^{<b}$ with respectively $P_1(x')$, $(\forall x')$
and $(\exists x')$. The remaining variables $x$ are naturally replaced with $x'$. It is
clear that the obtained formula $\varphi'$ has no free variable and belongs to $\mathcal{L}_{SF}$.
Indeed, $n$ appears in $\psi$ only through terms of the form $\epsilon_2(n, x)$. (As an
example, the reader can consider the formulas (2) and (1).) Therefore, $\varphi'$
defines a star-free language $L$ over $\{0, 1\}$. To conclude this part of the proof,
we have to show that $\rho_2(X) = L$. Let $n$ be such that $\langle \mathbb{N}, +, \epsilon_2 \rangle \models \psi(n)$.
Assume that $\epsilon_2(n, x)$ appears in $\psi$ with $x = 2^l$ for some $l < \log_2 b$ because
$x$ is within the scope of a quantifier $(\forall x)_2^{<b}$ or $(\exists x)_2^{<b}$. It means that $\rho_2(n)$
has a 1 in the $(l+1)$th position (counting positions from right to left and
beginning with 1). In $\varphi'$, corresponding to $\epsilon_2(n, x)$, we have $P_1(x')$, which
means that the model of a word (i.e., the representation of an integer)
satisfying $\varphi'$ has a 1 in position $x'$. Thus, we obtain the result when $x'$ is
identified with $1 + \log_2 x$. The upper limit in $\psi$ given by $b$ and appearing in
the quantifiers $(\forall x)_2^{<b}$ and $(\exists x)_2^{<b}$ is clearly understood in $\varphi'$ since in $\mathcal{L}_{SF}$ we
consider words as finite models. This is the reason for removing the first part
of $\psi$ to obtain $\varphi'$, and the constant max can be identified with $\log_2 b$.

Let us now assume that $X$ is a 2-star-free set. By McNaughton and
Papert's theorem, $\rho_2(X)$ is defined by a sentence $\psi$ in $\mathcal{L}_{SF}$ where the bound
variables are denoted $x, y, z, \ldots$ ($\ne n, b$). In $\psi$, we replace $(\forall x)$, $(\exists x)$ and
$P_1(x)$ with respectively $(\forall x)_2^{<b}$, $(\exists x)_2^{<b}$ and $\epsilon_2(n, x)$ to obtain a formula $\psi'$. It
is clear that

$$\varphi \equiv (\exists b)(\epsilon_2(b, b) \wedge \psi')$$

is a formula of $\mathcal{L}_{2,n}$ and has possibly $n$ as single free variable. To conclude
the proof, it is clear that

$$X = \{n \in \mathbb{N} \mid \langle \mathbb{N}, +, \epsilon_2 \rangle \models \varphi(n)\}.$$

One can view $b$ as $2^{\max}$ if max is a constant of the finite model associated
to a word. □

Example 6. The language $10^*$ can be defined by the following formula of $\mathcal{L}_{SF}$

$$(\exists x)(P_1(x) \wedge (\forall y)(P_1(y) \to x = y)).$$

The reader can check that this formula corresponds exactly to the formula
(3) in $\mathcal{L}_{2,n}$ if one proceeds to the transformations indicated in the second
part of the proof.

Remark 7. From the logical characterization of the 2-star-free sets given in
the previous theorem, other equivalent models can be obtained. In [?], it is
shown how a finite automaton $\mathcal{M}$ can be effectively derived from a formula
$\varphi$ of $\langle \mathbb{N}, +, V_2 \rangle$ defining a 2-recognizable set $X$. Using classical results [?],
it is also clear that the characteristic sequence of this $X$ is 2-automatic and
the morphisms generating it can be derived from $\mathcal{M}$ and thus from $\varphi$.

4. Generalization to linear numeration systems 

For the sake of simplicity, we have up to now considered the binary
numeration system but Theorem 5 can be extended to more general numeration
systems.

Definition 8. A linear numeration system $U$ is a strictly increasing
sequence $(U_n)_{n \in \mathbb{N}}$ of integers such that $U_0 = 1$, $\sup U_{n+1}/U_n$ is bounded and
satisfying for all $n \in \mathbb{N}$ a linear recurrence relation

$$U_{n+k} = c_{k-1}U_{n+k-1} + \cdots + c_0 U_n, \quad c_i \in \mathbb{Z},\ c_0 \ne 0.$$

By analogy to the binary system, the normalized representation of $x$ is
denoted by $\rho_U(x)$ (with leading zeroes allowed) and $V_U(x)$ is the smallest
$U_n$ appearing in the greedy decomposition of $x$ with a non-zero coefficient
($V_U(0) = U_0 = 1$). A set $X \subseteq \mathbb{N}$ is $U$-recognizable if $\rho_U(X)$ is regular.

In the following, we shall only consider the class $\mathcal{U}$ of linear numeration
systems $(U_n)_{n \in \mathbb{N}}$ whose characteristic polynomial is the minimal polynomial
of a Pisot number. For instance, the $k$-ary system and the Fibonacci system
belong to $\mathcal{U}$. The choice of the class $\mathcal{U}$ relies mainly upon the following
result. If $U = (U_n)_{n \in \mathbb{N}}$ is a numeration system belonging to $\mathcal{U}$ then the
$U$-recognizable sets are exactly those definable in $\langle \mathbb{N}, +, V_U \rangle$ (see [?, Theorem
16]). In fact, $\mathcal{U}$ is up to now the largest set of numeration systems having
well-known and useful properties such as the recognizability of addition.

Let $U = (U_n)_{n \in \mathbb{N}} \in \mathcal{U}$. Since $\sup U_{n+1}/U_n$ is bounded, the alphabet of the
normalized representations is finite and is denoted $A_U = \{0, \ldots, c\}$. Naturally,
words over $A_U$ will be interpreted as finite models $(M, <, P_1, \ldots, P_c, \max)$
and the star-free languages are exactly those defined by first-order sentences
in this formalism (the extension of $\mathcal{L}_{SF}$ defined in Section 2.1 is left to the
reader). As an example, if $w = 1230112$ then $P_1 = \{2, 3, 7\}$, $P_2 = \{1, 6\}$ and
$P_3 = \{5\}$.

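Instead of $V_U(x)$, we shall use $c$ binary relations $\epsilon_{j,U}(x, y)$, $j = 1, \ldots, c$,
meaning that $y$ is an element of the sequence $(U_n)_{n \in \mathbb{N}}$ appearing in the
normalized decomposition of $x$ with a coefficient $j$. Thus

$$x = \sum_{j=1}^{c} \sum_{(x,y) \in \epsilon_{j,U}} j\,y.$$

Remark 9. The structures $\langle \mathbb{N}, +, V_U \rangle$ and $\langle \mathbb{N}, +, \epsilon_{1,U}, \ldots, \epsilon_{c,U} \rangle$ are
equivalent: $\epsilon_{j,U}(x, y)$ is defined by

$$(V_U(y) = y) \wedge (\exists t)(\exists z)(x = t + j \cdot y + z \wedge z < y \wedge (y < V_U(t) \vee t = 0))$$

and $V_U(x) = y$ by

$$(\epsilon_{1,U}(x, y) \vee \cdots \vee \epsilon_{c,U}(x, y)) \wedge (\forall z)((\epsilon_{1,U}(x, z) \vee \cdots \vee \epsilon_{c,U}(x, z)) \to y \le z).$$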

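By analogy to $\mathcal{L}_{2,n}$ introduced in the frame of the binary system, we can
define a language $\mathcal{L}_{U,n}$ of formulas in $\langle \mathbb{N}, +, \epsilon_{1,U}, \ldots, \epsilon_{c,U} \rangle$ having possibly
a single free variable $n$. For instance,

$$(\exists x)_U^{<b}\varphi \equiv (\exists x)(\epsilon_{1,U}(x, x) \wedge x < b \wedge \varphi).$$

The reader can easily make up the complete definition of $\mathcal{L}_{U,n}$.

Let us just introduce two notations. If $\varphi$ is any formula of $\mathcal{L}_{U,n}$, we shall
denote by $\mathfrak{P}(\varphi)$ the main part of the formula (i.e., the largest sub-formula
in which $b$ is still free); namely, the formula is necessarily of the form

$$\varphi \equiv (\exists b)(\epsilon_{1,U}(b, b) \wedge \mathfrak{P}(\varphi)).$$

If $\rho_U(\mathbb{N})$ is aperiodic then it is definable by a sentence $\chi$ of $\mathcal{L}_{SF}$. In $\chi$ we
replace $P_j(x)$, $(\forall x)$ and $(\exists x)$ with $\epsilon_{j,U}(n, x)$, $(\forall x)_U^{<b}$ and $(\exists x)_U^{<b}$ respectively
to obtain a formula $\chi_{\mathcal{N}}$ being the main part of a formula in $\mathcal{L}_{U,n}$.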

Theorem 10. Let $U$ be a numeration system in $\mathcal{U}$. If $\mathcal{N}_U = 0^*\rho_U(\mathbb{N})$ is
aperiodic then a set $X \subseteq \mathbb{N}$ is $U$-star-free (i.e., $\rho_U(X)$ is regular aperiodic)
if and only if $X$ is definable by a first-order formula of $\mathcal{L}_{U,n}$ of the form

$$(\exists b)(\epsilon_{1,U}(b, b) \wedge \mathfrak{P}(\varphi) \wedge \chi_{\mathcal{N}})$$

where $\varphi$ is a first-order formula of $\mathcal{L}_{U,n}$.

Proof. The only differences with the proof of Theorem 5 appear when we
show that the condition is sufficient. Roughly speaking, we have to
be careful with the choice of a formula $\varphi$ in $\mathcal{L}_{U,n}$ because we want to obtain
a corresponding formula in $\mathcal{L}_{SF}$ valid only for normalized representations
interpreted as finite models. To avoid this problem we use the formula $\chi_{\mathcal{N}}$.
Let $\varphi$ be any formula of $\mathcal{L}_{U,n}$. It is necessarily of the form

$$(\exists b)(\epsilon_{1,U}(b, b) \wedge \mathfrak{P}(\varphi)).$$

Assuming that $\chi_{\mathcal{N}}$ and $\mathfrak{P}(\varphi)$ have different variables except for $n$ and $b$, then

$$\psi \equiv (\exists b)(\epsilon_{1,U}(b, b) \wedge \mathfrak{P}(\varphi) \wedge \chi_{\mathcal{N}})$$

is again a formula of $\mathcal{L}_{U,n}$. Adding the part $\chi_{\mathcal{N}}$ in such a formula $\psi$ ensures
that if we transform $\psi$ into a sentence $\psi'$ of $\mathcal{L}_{SF}$ (following the scheme given
in the proof of Theorem 5) then the words satisfying $\psi'$ are all normalized
$U$-representations and the corresponding language is aperiodic. □

Remark 11. Notice that for the $k$-ary system, the set of all the normalized
$k$-representations (allowing leading zeroes) is aperiodic,

$$0^*\rho_k(\mathbb{N}) = \{0, \ldots, k-1\}^* = \Sigma_k^*$$

and any word of $\Sigma_k^*$ is a valid normalized $k$-representation. So in this special
case, we do not need a formula $\chi_{\mathcal{N}}$. To be precise, $\chi_{\mathcal{N}}$ is a tautology. In
particular, this explains the simpler form of Theorem 5, which holds for any
numeration system with an integer base $k$.

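Example 12. Let us consider the Fibonacci system given by $U_0 = 1$, $U_1 = 2$
and $U_{n+2} = U_{n+1} + U_n$. As a consequence of the greedy algorithm,

$$\mathcal{N}_U = \overline{\overline{\emptyset}\,11\,\overline{\emptyset}}$$

is aperiodic and defined by the following sentence of $\mathcal{L}_{SF}$

$$\chi \equiv (\forall x)(\forall y)[(\exists z)(x < z < y) \vee \neg(P_1(x) \wedge P_1(y))]$$

corresponding to

$$\chi_{\mathcal{N}} \equiv (\forall x)_U^{<b}(\forall y)_U^{<b}[(\exists z)_U^{<b}(x < z < y) \vee \neg(\epsilon_{1,U}(n, x) \wedge \epsilon_{1,U}(n, y))].$$

So any formula $\varphi$ of $\mathcal{L}_{U,n}$ gives a new formula

$$(\exists b)(\epsilon_{1,U}(b, b) \wedge \mathfrak{P}(\varphi) \wedge \chi_{\mathcal{N}})$$

defining a $U$-star-free subset of $\mathbb{N}$ (which could be finite or empty depending
on the compatibility of the conditions given by $\mathfrak{P}(\varphi)$ and $\chi_{\mathcal{N}}$).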

Continuing this example, we show that the set of even integers is not
$U$-star-free although it is easily definable in the Presburger arithmetic by

$$\varphi(n) \equiv (\exists x)(n = x + x).$$

Indeed, $U_n$ is even if and only if $n \equiv 1 \pmod 3$ and therefore, for any $n$, two
but not all three of the words $1(01)^n$, $1(01)^{n+1}$ and $1(01)^{n+2}$ are in the language
$\rho_U(2\mathbb{N})$. So the set of even integers is not definable in $\mathcal{L}_{U,n}$.
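The parity argument above can be checked numerically. The sketch below is an illustration with helper names of our own (`fibonacci_terms`, `greedy_representation`, `value`); it computes greedy Fibonacci representations with $U_0 = 1$, $U_1 = 2$ and evaluates the words $1(01)^n$.

```python
def fibonacci_terms(count: int) -> list:
    """First `count` terms of U_0 = 1, U_1 = 2, U_{n+2} = U_{n+1} + U_n."""
    U = [1, 2]
    while len(U) < count:
        U.append(U[-1] + U[-2])
    return U[:count]


def greedy_representation(x: int) -> str:
    """Normalized representation of x, most significant digit first ('' for 0)."""
    count = 0
    while fibonacci_terms(count + 1)[-1] <= x:
        count += 1
    digits = []
    for u in reversed(fibonacci_terms(count)):
        if u <= x:
            digits.append("1")
            x -= u
        else:
            digits.append("0")
    return "".join(digits)


def value(word: str) -> int:
    """Numerical value of a word over {0, 1}, most significant digit first."""
    U = fibonacci_terms(len(word))
    return sum(U[i] for i, letter in enumerate(reversed(word)) if letter == "1")


# Parity of the values of the words 1(01)^n for n = 0, 1, 2, ...
parities = [value("1" + "01" * n) % 2 for n in range(12)]
print(parities)
```

On the range tested, exactly two of any three consecutive words $1(01)^n$ have an even value, which is the periodic behaviour used in the argument.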
5. Base dependence

In this section, we consider once again integer base numeration systems 
and study the base dependence of the star-free property. We show that the 
sets of integers are classified into four categories. 

The proof of the first result in this section does not use the previous logical 
characterization of the p-star-free sets but relies mainly on automata theory 
arguments. 

Proposition 13. Let $p, k \ge 2$. A set $X \subseteq \mathbb{N}$ is $p$-star-free if and only if it
is $p^k$-star-free.

Proof. Let us first show that if $X \subseteq \mathbb{N}$ is $p^k$-star-free then $X$ is $p$-star-free.
Assume that $\rho_{p^k}(X)$ is obtained by an extended regular expression over the
alphabet $\Sigma_{p^k} = \{0, \ldots, p^k - 1\}$ without star operation. In this expression,
one can replace each occurrence of a letter $j \in \Sigma_{p^k}$ with the word $0^{k-l}\rho_p(j)$
($l = |\rho_p(j)|$) of length $k$. Since we only use concatenation product, the
resulting expression defining the language $L \subseteq \{0, \ldots, p-1\}^*$ is still
star-free and it is clear that $\rho_p(X) = L$.

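Example 14. The set $X = \{3 \cdot 4^n \mid n \in \mathbb{N}\}$ is 4-star-free,

$$\rho_4(X) = 3\,0^* = \{3\}\,\overline{\overline{\emptyset}\{1, 2, 3\}\overline{\emptyset}}.$$

The set $X$ is also 2-star-free; we simply have to replace 0, 1, 2 and 3 with
respectively 00, 01, 10 and 11 and

$$\rho_2(X) = 11\,(00)^* = \{11\}\,\overline{\overline{\emptyset}\{01, 10, 11\}\overline{\emptyset}}.$$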
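The letter-to-block substitution used in this example can be illustrated numerically. The following sketch (with hypothetical helper names `to_base` and `expand_base4_to_base2`) only checks the shape of the expansions of $3 \cdot 4^n$ in bases 4 and 2 and the effect of replacing each base-4 digit by its 2-bit block; it says nothing about star-freeness itself.

```python
def to_base(n: int, base: int) -> str:
    """Expansion of n in the given base (2 <= base <= 10), '' for n = 0."""
    digits = []
    while n > 0:
        n, d = divmod(n, base)
        digits.append(str(d))
    return "".join(reversed(digits))


# Each base-4 digit is replaced by its binary block of length 2.
BLOCKS = {"0": "00", "1": "01", "2": "10", "3": "11"}


def expand_base4_to_base2(word: str) -> str:
    """Apply the substitution 0 -> 00, 1 -> 01, 2 -> 10, 3 -> 11 letter by letter."""
    return "".join(BLOCKS[letter] for letter in word)


for n in range(4):
    x = 3 * 4 ** n
    print(x, to_base(x, 4), to_base(x, 2))
```

In general the substituted word may start with a 0 (for instance when the leading base-4 digit is 1), which is why leading zeroes are allowed in the representations.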

Before continuing the proof, we recall another characterization of the 
star-free languages given by McNaughton and Papert. 

Definition 15. A deterministic finite automaton is permutation free if there 
is no word that makes a nontrivial permutation (i.e., not the identity per- 
mutation) of any subset of the set of states. In the same way, a language is 
said to be permutation free if its minimal automaton is permutation free. 
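Definition 15 can be tested mechanically on a small automaton. The following is a minimal sketch, not taken from the text: it assumes a complete deterministic automaton with states numbered $0, \ldots, n-1$, encodes the action of each letter as a tuple (an encoding chosen here for illustration), and uses the standard reformulation that an automaton is permutation free if and only if every map $m$ of its transition monoid satisfies $m^j = m^{j+1}$ for some $j$.

```python
def transition_monoid(generators):
    """All state maps induced by nonempty words.

    Each map is a tuple m with m[q] the state reached from q.
    """
    seen = set(generators)
    frontier = list(seen)
    while frontier:
        m = frontier.pop()
        for g in generators:
            # Read the word inducing m first, then the letter inducing g.
            composed = tuple(g[q] for q in m)
            if composed not in seen:
                seen.add(composed)
                frontier.append(composed)
    return seen


def powers_stabilize(m) -> bool:
    """True iff m^j == m^(j+1) for some j >= 1."""
    power = m
    for _ in range(len(m) + 1):
        next_power = tuple(m[q] for q in power)
        if next_power == power:
            return True
        power = next_power
    return False


def is_permutation_free(generators) -> bool:
    """Check that no word makes a nontrivial permutation of a subset of states."""
    return all(powers_stabilize(m) for m in transition_monoid(generators))


# Automaton for 1+0*: states 0 (start), 1 (reading 1s), 2 (reading 0s), 3 (sink).
one_plus_zero_star = [(3, 2, 2, 3), (1, 1, 3, 3)]  # actions of the letters 0 and 1
# Automaton counting the parity of the number of 1s.
parity_of_ones = [(0, 1), (1, 0)]
print(is_permutation_free(one_plus_zero_star), is_permutation_free(parity_of_ones))
```

The test applies to the automaton it is given; to decide whether a language is permutation free in the sense of the definition, it should be run on the minimal automaton.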

Example 16. The automaton depicted in Figure 1 is not permutation free.
Indeed, the word 01 makes a nontrivial permutation of the set $\{p, r\}$.




Figure 1. A non permutation free automaton. 



Theorem 17. [12, Theorem 5.1] A language is star-free if and only if it is
permutation free.

Let us also recall a well-known result from automata theory.

Proposition 18. [?, Section III.5] Let $L \subseteq \Sigma^*$ be a regular language having
$\mathcal{M}_L = (Q_L, q_{0,L}, F_L, \Sigma, \delta_L)$ as minimal automaton. If $\mathcal{M} = (Q, q_0, F, \Sigma, \delta)$
is an accessible deterministic automaton recognizing $L$ then there exists a
map $\Phi : Q \to Q_L$ such that $\Phi$ is onto and for each $q \in Q$ and $w \in \Sigma^*$,

$$\Phi(\delta(q, w)) = \delta_L(\Phi(q), w).$$

Assume now that $X \subseteq \mathbb{N}$ is $p$-star-free. Using Remark 1 and
Theorem 17, $0^*\rho_p(X)$ is a permutation free language and we denote by $\mathcal{M} =
(Q, q_0, F, \Sigma_p, \delta)$ its minimal automaton. From $\mathcal{M}$, we build a new
automaton $\mathcal{M}' = (Q, q_0, F, \Sigma_{p^k}, \delta')$ having the same set of states. The transition
function $\delta'$ of $\mathcal{M}'$ is defined as follows. For each $j \in \Sigma_{p^k}$ and $q, q' \in Q$, let
$w = 0^{k-l}\rho_p(j)$ where $l = |\rho_p(j)|$; then $\delta'(q, j) = q'$ if and only if $\delta(q, w) = q'$.

Example 19. Let $\Sigma_2 = \{0, 1\}$ and consider the automaton $\mathcal{M}$ depicted
in Figure 2. If we consider the 4-ary numeration system, we build a new
automaton $\mathcal{M}'$ depicted in Figure 3 by considering in $\mathcal{M}$ the paths of label
00, 01, 10 and 11 replaced respectively by 0, 1, 2 and 3.

Figure 2. An automaton $\mathcal{M}$ over $\Sigma_2$.

Figure 3. The corresponding automaton $\mathcal{M}'$ over $\Sigma_4$.




The automaton $\mathcal{M}'$ is accessible. Let $q \in Q$. Since $\mathcal{M}$ is accessible, there
exists a word $w \in \Sigma_p^*$ such that $\delta(q_0, w) = q$. Observe that in $\mathcal{M}$, we have
a loop of label 0 in the initial state $q_0$. So for any $n \in \mathbb{N}$, $\delta(q_0, 0^n w) = q$.
Choose $n$ such that $|0^n w| = ik$. The word $0^n w$ corresponds to a word
$w' \in (\Sigma_{p^k})^*$ and $\delta'(q_0, w') = q$. So $\mathcal{M}'$ is accessible.

The automaton $\mathcal{M}'$ is permutation free. Assume the contrary: there
exist $T \subseteq Q$ and a word $w' \in \Sigma_{p^k}^*$ such that $w'$ makes a nontrivial
permutation of $T$. Then $w'$ corresponds to a word $w \in \Sigma_p^*$ producing a nontrivial
permutation of the same subset $T$ of the set of states of $\mathcal{M}$. This is a
contradiction.

It is clear from the construction of $\mathcal{M}'$ that this automaton accepts
exactly the language $0^*\rho_{p^k}(X)$. The only remaining problem before being able
to apply Theorem 17 is that $\mathcal{M}'$ is not necessarily reduced. Indeed, due to
its definition, through $\mathcal{M}'$ we only look in $\mathcal{M}$ at words of length $ik$, $i \ge 0$.
Let $\mathcal{M}'' = (Q'', q_0'', F'', \Sigma_{p^k}, \delta'')$ be the minimal automaton of $0^*\rho_{p^k}(X)$.
Thanks to Proposition 18, we have an onto map $\Phi$ between the
automata $\mathcal{M}'$ and $\mathcal{M}''$, namely between the sets $Q$ and $Q''$. To conclude the
proof by applying Theorem 17 we have to show that $\mathcal{M}''$ is permutation
free. Assume that there exist a subset $T$ of $Q''$ and a word $w \in \Sigma_{p^k}^*$ such
that $w$ makes a nontrivial permutation of $T$. So there exists a state $q \in T$
such that $\delta''(q, w) \in T$ and $\delta''(q, w) \ne q$. Therefore $\mathcal{M}'$ is not permutation
free: $w$ makes a nontrivial permutation of $\Phi^{-1}(T) \subseteq Q$. Indeed, there exist
$r \in \Phi^{-1}(q) \subseteq \Phi^{-1}(T)$ and $s \in \Phi^{-1}(\delta''(q, w)) \subseteq \Phi^{-1}(T)$ such that $r \ne s$ and
$\delta'(r, w) = s$. This is a contradiction because we have shown previously that
$\mathcal{M}'$ is permutation free. □

It is well known that any finite union of arithmetic progressions is
$p$-star-free for some $p$ (see [10, Theorem 1.4]). So a natural question is to determine
if an arithmetic progression $r + s\mathbb{N} = \{r + sn \mid n \in \mathbb{N}\}$, $s > 1$, is $p$-star-free
for any $p \ge 2$ or only for some specific bases $p$.

Example 20. The set of even integers is 2-star-free and therefore
$2^n$-star-free for each $n$. But this set is not 3-star-free; indeed $\rho_3(2\mathbb{N})$ is the set of
words over $\{0, 1, 2\}$ having an even number of 1's (and therefore the minimal
automaton of this language is not counter-free, which is another way to say
that the language is not permutation free). Notice also that 2 and 10 are
multiplicatively independent but $2\mathbb{N}$ is 10-star-free. Actually, it is easy to
see that the set of even integers is $(2p)$-star-free, for any $p$. So with this
example, we see that we obtain a slightly different phenomenon than the one
encountered in Cobham's theorem.
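The two descriptions used in this example are easy to check by computation. The sketch below is an illustration with our own function names; it verifies on an initial segment of $\mathbb{N}$ that an integer is even exactly when its ternary expansion contains an even number of 1's, and that in an even base the last digit alone decides parity.

```python
def digits(n: int, base: int) -> list:
    """Digits of n in the given base, most significant first ([] for n = 0)."""
    out = []
    while n > 0:
        n, d = divmod(n, base)
        out.append(d)
    return out[::-1]


def even_iff_even_number_of_ones_in_base_3(limit: int) -> bool:
    """Check the description of the ternary expansions of even integers."""
    return all(
        (n % 2 == 0) == (digits(n, 3).count(1) % 2 == 0) for n in range(limit)
    )


def last_digit_decides_parity(base: int, limit: int) -> bool:
    """True iff, below `limit`, n is even exactly when its last digit is even."""
    return all((n % 2 == 0) == ((n % base) % 2 == 0) for n in range(limit))


print(even_iff_even_number_of_ones_in_base_3(3 ** 6))
print(last_digit_decides_parity(10, 1000), last_digit_decides_parity(3, 1000))
```

The ternary criterion holds because $3 \equiv 1 \pmod 2$, so the parity of an integer is the parity of its digit sum, and the digits 0 and 2 do not contribute.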

A finite union of arithmetic progressions being ultimately periodic, we
can always write it as $\bigcup_{j}(r_j + s\mathbb{N}) \cup F$ where $F$ is a finite set and $s$ is
the l.c.m. of the periods of the different progressions. Since aperiodicity is
preserved up to a finite modification of a language, we can forget the finite
set $F$ and assume that $r_j < s$. A union of aperiodic sets being again aperiodic,
we shall consider a single set $r + s\mathbb{N}$.

Proposition 21. The set $r + s\mathbb{N}$ (with $r < s$ and $s > 1$) is $(is)$-star-free
for any integer $i > 0$.

Proof. The reader can easily check that the language made up of the $(is)$-ary
expansions of the elements in $r + s\mathbb{N}$ is

$$\Sigma_{is}^*\{r, r + s, \ldots, r + (i-1)s\}$$

which is a definite^1 language. □
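The claim in this proof amounts to saying that the last digit in base $is$ decides membership in $r + s\mathbb{N}$. A small numerical check, written as a sketch with a hypothetical function name, is as follows.

```python
def last_digit_decides_membership(r: int, s: int, i: int, limit: int) -> bool:
    """Check that n is in r + sN iff its last digit in base i*s is in {r, r+s, ...}.

    Assumes 0 <= r < s, s > 1 and i > 0, as in the proposition.
    """
    base = i * s
    allowed_last_digits = {r + j * s for j in range(i)}
    return all(
        (n % s == r) == (n % base in allowed_last_digits) for n in range(limit)
    )


print(last_digit_decides_membership(1, 3, 2, 1000))
```

Here `n % s == r` is used as the membership test for $r + s\mathbb{N}$, which is valid because $r < s$.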

We even have a better situation. 

Proposition 22. Let $r + s\mathbb{N}$ be such that $r < s$, $s > 1$ and the factorization
of $s$ as a product of primes is of the form

$$s = p_1^{\alpha_1} \cdots p_k^{\alpha_k}, \quad \alpha_i > 0.$$

If $P = \prod_{j=1}^{k} p_j$ then $r + s\mathbb{N}$ is $(iP)$-star-free for any integer $i > 0$.

^1 A language $L \subseteq \Sigma^*$ is said to be definite if there exist finite languages $M$ and $N$ such
that $L = N \cup \Sigma^* M$. So to test the membership of a word in $L$, we only have to look at
its last letters.

Proof. Let $\alpha = \sup_{j=1,\ldots,k} \alpha_j$. By definition of $P$, it is clear that $(iP)^{\alpha+n}$ is
a multiple of $s$ for all integers $n \ge 0$ and $i > 0$. So in the $(iP)$-ary expansion
of an integer, the digits corresponding to those powers of $iP$ provide the
decomposition with multiples of $s$. To obtain an element of $r + s\mathbb{N}$, we
thus have to focus on the last $\alpha$ digits, corresponding to the powers $1, iP,
\ldots, (iP)^{\alpha-1}$ of weakest weight. Consider the finite set

$$Y = \{r + ns \mid n \in \mathbb{N} \text{ and } r + ns < (iP)^{\alpha}\}.$$

For each $y_j \in Y$, $j = 1, \ldots, t$, consider the word $\rho_{iP}(y_j)$ preceded by some
zeroes to obtain a word $y_j' \in \Sigma_{iP}^*$ of length $\alpha$. To conclude the proof, observe
that the language made up of the $(iP)$-ary expansions of the elements in
$r + s\mathbb{N}$ is

$$\Sigma_{iP}^*\{y_1', \ldots, y_t'\}$$

and is a definite language. □
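The argument of this proof can be tested on examples such as $s = 12 = 2^2 \cdot 3$, where $P = 6$ and $\alpha = 2$. The sketch below uses helper names of our own (`factorize`, `radical_and_alpha`, `suffix_decides_membership`) and checks that the last $\alpha$ digits in base $iP$, that is the residue modulo $(iP)^{\alpha}$, decide membership in $r + s\mathbb{N}$.

```python
def factorize(s: int) -> dict:
    """Prime factorization of s > 1 by trial division, as {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= s:
        while s % d == 0:
            factors[d] = factors.get(d, 0) + 1
            s //= d
        d += 1
    if s > 1:
        factors[s] = factors.get(s, 0) + 1
    return factors


def radical_and_alpha(s: int):
    """Return (P, alpha): the product of the primes of s and the largest exponent."""
    factors = factorize(s)
    P = 1
    for prime in factors:
        P *= prime
    return P, max(factors.values())


def suffix_decides_membership(r: int, s: int, i: int, limit: int) -> bool:
    """Check that n is in r + sN iff n mod (iP)^alpha lies in the finite set Y."""
    P, alpha = radical_and_alpha(s)
    modulus = (i * P) ** alpha
    if modulus % s != 0:
        return False
    Y = set(range(r, modulus, s))
    return all((n % s == r) == (n % modulus in Y) for n in range(limit))


print(radical_and_alpha(12), suffix_decides_membership(5, 12, 1, 2000))
```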



Remark 23. The situation of Proposition 22 cannot be improved. Indeed,
with the previous notations, consider an integer $Q$ which is a product of
some but not all the prime factors appearing in $s$. For the sake of simplicity,
assume that

$$Q = p_2^{\beta_2} \cdots p_k^{\beta_k}.$$

For each $n$, $Q^n \not\equiv 0 \pmod s$. Indeed, if $Q^n = i\,s$ then

$$p_2^{n\beta_2} \cdots p_k^{n\beta_k} = i\,p_1^{\alpha_1} \cdots p_k^{\alpha_k}$$

which is a contradiction since $p_1$ does not appear in the left hand side
factorization. Moreover, it is clear that the sequence $(Q^n \bmod s)_{n \in \mathbb{N}}$ is ultimately
periodic. Therefore $\rho_Q(r + s\mathbb{N})$ is regular but not star-free because, due to
this periodicity, the corresponding automaton is not counter-free. As an
example, one can check that $6\mathbb{N}$ is neither 2-star-free nor 3-star-free.
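The behaviour of the sequence $(Q^n \bmod s)_{n \in \mathbb{N}}$ can be observed directly. The following sketch (the function name `power_residues` is ours) lists the residues up to the first repetition and returns the preperiod and period; it only illustrates the two facts used above, namely that 0 never occurs and that the sequence is ultimately periodic.

```python
def power_residues(Q: int, s: int):
    """Residues Q^0, Q^1, ... modulo s up to the first repetition.

    Returns (residues, preperiod, period).
    """
    first_index = {}
    residues = []
    value = 1 % s
    while value not in first_index:
        first_index[value] = len(residues)
        residues.append(value)
        value = (value * Q) % s
    preperiod = first_index[value]
    return residues, preperiod, len(residues) - preperiod


print(power_residues(2, 6))
print(power_residues(2, 12))
```

For $Q = 2$ and $s = 6$ the residues are $1, 2, 4, 2, 4, \ldots$, a sequence with preperiod 1 and period 2 in which 0 does not occur.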

To summarize the situation, the sets of integers can be classified into four 
categories: 

1. The finite and cofinite sets are p-star-free for any p > 1. 

2. The ultimately periodic sets of period $s = p_1^{\alpha_1} \cdots p_k^{\alpha_k} > 1$ are
$(iP)$-star-free for $P = \prod_{j=1}^{k} p_j$ and any $i > 0$. In particular, these sets are
$P^m$-star-free for $m \ge 1$.

3. Thanks to Cobham's theorem, if a $p$-recognizable set $X$ is not a finite
union of arithmetic progressions then $X$ is only $p^k$-recognizable for
$k \ge 1$ ($p$ being simple^2). So if a $p$-star-free set $X$ is not ultimately
periodic then $X$ is only $p^k$-star-free for $k \ge 1$ ($p$ being simple).

4. Finally, there are sets which are not $p$-star-free for any $p > 1$.



^2 Being multiplicatively dependent is an equivalence relation over $\mathbb{N}$; the smallest
element in an equivalence class is said to be simple. For instance, 2, 3, 5, 6, 7, 10, 11 are
simple.

6. $p$-adic number systems

The $p$-ary numeration system is built upon the sequence $U_n = p^n$ and
the representation of an integer is a word over the alphabet $\{0, \ldots, p-1\}$
computed through the greedy algorithm. On the other hand, the $p$-adic
numeration system is built upon the same sequence but representations are
written over the alphabet $\{1, \ldots, p\}$. It can be shown that each integer has
a unique $p$-adic representation (see [?] for an exposition on $p$-adic number
systems). For instance, the use of the $p$-adic system may be relevant to remove
the ambiguity due to the presence of leading zeroes in a $p$-ary representation.
Indeed, 0 is not a valid digit in a $p$-adic representation (see for instance
[?, p. 303] for a relation to L systems).

In this short section, we show that the $p$-star-free sets are exactly the sets
of integers whose $p$-adic representations make up a star-free language.

Capital Greek letters will represent finite alphabets.

If $A \subset \mathbb{Z}$ is a finite alphabet and $w = w_n \cdots w_0$ is a finite word over $A$,
we denote by $\pi_p(w)$ the numerical value of $w$,

$$\pi_p(w) = \sum_{i=0}^{n} w_i\,p^i.$$

For instance, 1001 and 121 are respectively the 2-ary and 2-adic
representations of 9,

$$\pi_2(1001) = \pi_2(121) = 9.$$
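Both the numerical value $\pi_p$ and the $p$-adic representation are simple to compute. The sketch below is an illustration with function names of our own; the $p$-adic digits are obtained by the usual division procedure in which a remainder 0 is replaced by the digit $p$ and one unit is borrowed from the quotient.

```python
def numerical_value(digits, p: int) -> int:
    """Value of a word given as a list of integer digits, most significant first."""
    value = 0
    for d in digits:
        value = value * p + d
    return value


def p_adic_representation(n: int, p: int) -> list:
    """Digits in {1, ..., p} of the p-adic representation of n ([] for n = 0)."""
    digits = []
    while n > 0:
        n, d = divmod(n, p)
        if d == 0:
            # The digit 0 is not allowed: use p and borrow one from the quotient.
            d = p
            n -= 1
        digits.append(d)
    return digits[::-1]


print(p_adic_representation(9, 2), numerical_value([1, 0, 0, 1], 2))
```

One can check on an initial segment of the integers that the representation produced uses only digits in $\{1, \ldots, p\}$ and has the expected numerical value.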

Let $w \in A^*$ be such that $\pi_p(w) \in \mathbb{N}$. The partial function $\nu_p : A^* \to
\{0, \ldots, p-1\}^*$ mapping $w$ onto $\rho_p(\pi_p(w))$ is called the normalization
function. Thanks to a result of C. Frougny, the graph of this function is regular
whatever the alphabet $A$ is [?]. For the case we are interested in, the
language

$$\nu_p^R = \{(u, v) \mid u \in \{1, \ldots, p\}^* 0^*,\ v \in \Sigma_p^*,\ |u| = |v|,\ \pi_p(u^R) = \pi_p(v^R)\}$$

is the reversal of the graph of the normalization function mapping the $p$-adic
representation of an integer onto its $p$-ary representation. The trim minimal
automaton (the sink has not been represented) of $\nu_p^R$ is given in Figure 4
and is clearly permutation free. So thanks to Theorem 17, $\nu_p^R$ is star-free.



Figure 4. From $p$-adic to $p$-ary representation.

Lemma 24. A language $L \subseteq \Sigma^*$ is star-free if and only if $L^R$ is star-free.

Proof. Let $u, v, w \in \Sigma^*$. Assume that $L$ is aperiodic; for $n$ large enough,

$$uv^n w \in L^R \Leftrightarrow w^R (v^R)^n u^R \in L \Leftrightarrow w^R (v^R)^{n+1} u^R \in L \Leftrightarrow uv^{n+1} w \in L^R.$$

□

Thus the graph $\hat{\nu}_p$ of the normalization function, being the reversal of $\nu_p^R$, is a star-free language.

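We denote by $p_1$ and $p_2$ the canonical homomorphisms of projection: if $A$
and $B$ are sets, $p_1 : A \times B \to A : (a, b) \mapsto a$ and $p_2 : A \times B \to B : (a, b) \mapsto b$.

Lemma 25. Let $L \subseteq \Sigma^*$ be a star-free language. Then the language

$$L \oplus \Gamma^* = \{(x, y) \mid x \in L,\ y \in \Gamma^*,\ |x| = |y|\}$$

is also star-free.

Proof. Let $u, v, w \in (\Sigma \times \Gamma)^*$. Since $L$ is star-free and $p_1$ and $p_2$ are
letter-to-letter (length preserving) homomorphisms, we have, for $n$ large enough,

$$\begin{aligned}
uv^n w \in L \oplus \Gamma^* &\Leftrightarrow p_1(uv^n w) \in L \ \&\ p_2(uv^n w) \in \Gamma^* \\
&\Leftrightarrow p_1(u)p_1(v)^n p_1(w) \in L \ \&\ p_2(u)p_2(v)^n p_2(w) \in \Gamma^* \\
&\Leftrightarrow p_1(u)p_1(v)^{n+1} p_1(w) \in L \ \&\ p_2(u)p_2(v)^{n+1} p_2(w) \in \Gamma^* \\
&\Leftrightarrow p_1(uv^{n+1} w) \in L \ \&\ p_2(uv^{n+1} w) \in \Gamma^* \\
&\Leftrightarrow uv^{n+1} w \in L \oplus \Gamma^*.
\end{aligned}$$

□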

Naturally, we can also define the language $\Gamma^* \oplus L$ in a similar way.

Generally, the homomorphic image of a star-free language is not star-free
[?, p. 12] but the following weaker result holds.

Lemma 26. If a language $L \subseteq (\Sigma \times \Gamma)^*$ of couples of words of the same
length is star-free then $p_1(L) \subseteq \Sigma^*$ and $p_2(L) \subseteq \Gamma^*$ are also star-free.

Proof. One can apply the same reasoning as the one given in the proof of
the previous lemma. □

We are now able to prove the main result of this section (notice that a
brief mention of $p$-adic systems also appears in [11]).

Proposition 27. A set $X \subseteq \mathbb{N}$ is $p$-star-free if and only if the language
made up of the $p$-adic representations of the elements in $X$ is star-free.

Proof. If $\rho_p(X)$ is a star-free language then thanks to Lemma 25,

$$\{0, \ldots, p\}^* \oplus \rho_p(X)$$

is star-free. Thanks to Lemma 24 and since the family of aperiodic languages
is closed under Boolean operations, the language

$$L = [\{0, \ldots, p\}^* \oplus \rho_p(X)] \cap \hat{\nu}_p,$$

where $\hat{\nu}_p$ denotes the graph of the normalization function, is again star-free.
To conclude the first part of the proof, we apply Lemma 26: $p_1(L)$ is
star-free and it is clear that this language is exactly made up of
the $p$-adic representations of the elements in $X$.

Conversely, let $M \subseteq \{1, \ldots, p\}^*$ be such that $\pi_p(M) = X$. If $M$ is
star-free then thanks to Remark 1 and using the previous lemmas,

$$p_2[(0^* M \oplus \{0, \ldots, p-1\}^*) \cap \hat{\nu}_p]$$

is star-free. □